- Some notes about 386video....
-
- IT'S LIMITED
- The current 386video module does not exploit the full power
- of XVD drivers.
- I coded 386video with a generic interface (the interface won't change
- in future releases), but the underlying code targets
- 320x200 256-color screen modes only.
- If you want to exploit the full power of XVD drivers you'll have to
- enhance 386video yourself (sorry, I'm under too much pressure to do it now).
-
- RAM BUFFERING and the VRAM BOTTLENECK
-
- I use RAM BUFFERING to render each frame: first I compose the next graphic
- image in SYSTEM RAM and then blit it to DISPLAY RAM.
- The main reason to use RAM BUFFERING is that display ram is usually SLOWER
- than system ram, and usually offers less i/o bandwidth
- to the processor.
- What's more, it is faster to cache system ram than display ram
- (again the i/o bottleneck).
- So if you have to access the display frame you are composing multiple times,
- it is better to render it in faster system ram and then copy it once
- to vram.
- Another reason to use ram buffering is that if you have only one visible
- display page you can't use the double display page trick, so when you
- update the display you have to be as fast as you can.
- If all you have to do is copy the buffered image, you are sure to be using
- the fastest update method.
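-
- A minimal C sketch of the compose-then-copy idea described above. This is
- not the 386video code (which is assembly); the names frame_buffer,
- fill_rect and blit_frame are hypothetical, and `vram` stands for whatever
- pointer maps the linear 320x200 display page on your system.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200
#define FRAME_BYTES (SCREEN_W * SCREEN_H)   /* 64000 bytes at 8 bit/pixel */

/* Off-screen frame composed in system RAM. */
static uint8_t frame_buffer[FRAME_BYTES];

/* Compose step: may overwrite the same pixels many times, all of it in
 * system RAM. No clipping: the caller must keep the rectangle on screen. */
static void fill_rect(int x, int y, int w, int h, uint8_t color)
{
    for (int row = y; row < y + h; row++)
        memset(&frame_buffer[row * SCREEN_W + x], color, (size_t)w);
}

/* Update step: one sequential copy to display RAM.
 * `vram` is an assumed pointer to the visible page; any buffer of
 * FRAME_BYTES bytes works for testing. */
static void blit_frame(uint8_t *vram)
{
    memcpy(vram, frame_buffer, FRAME_BYTES);
}
```

- However many times fill_rect overdraws a pixel, display ram is written
- exactly once per pixel per frame.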
-
- On some systems, a big cache and a good bus interface make vram
- look as fast as system ram, but your program has to run even on
- "weak" systems with vram bandwidth bottlenecks.
-
- DELTA BLITTING:
-
- Plain ram buffering works well if system ram is a lot faster than vram
- AND you don't have bus bandwidth bottlenecks.
- The plain ISA bus is 16 bits wide (8 bits for some cards)
- with a standard 8 MHz clock; this translates to a 1..2 Mbyte/sec bandwidth
- when copying from memory to memory, while a plain 386/25 has at least
- 4..8 Mbyte/sec of bandwidth available when accessing system ram.
- Some systems support some kind of "speeded up ISA" bus (mine can run ISA at
- 12 MHz, others support internal buffering and "fast cycling") but
- even if you run on a 120 MIPS Pentium, with an ISA (8bit or 16bit) card
- you can't go far.
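-
- To put those figures in terms of frames, here is a rough back-of-envelope
- sketch in C. It assumes 1 Mbyte = 1,000,000 bytes, a 320x200 frame at one
- byte per pixel, and that the bus does nothing but copy frames; the function
- name is hypothetical.

```c
/* One full 320x200, 8 bit/pixel frame. */
#define FRAME_BYTES (320L * 200L)   /* 64000 bytes */

/* Upper bound on full-frame copies per second for a given bandwidth.
 * Integer division, so the result is rounded down. */
static long max_full_blits_per_sec(long bus_bytes_per_sec)
{
    return bus_bytes_per_sec / FRAME_BYTES;
}
```

- Under these assumptions 1..2 Mbyte/sec allows at most about 15..31 full
- frame copies per second, while 4..8 Mbyte/sec allows about 62..125.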
-
- The answer to these bottlenecks is DELTA BLITTING: instead of
- blitting the whole display page, blit only the differences between
- the previous display frame and the next.
-
- Usually there is a strong correlation between the image already displayed
- and the next one still in system ram, so it is possible to boost
- animation speed a lot.
-
- The speed of my 'test program' was 23..24 Frames Per Second (FPS)
- with "simple ram buffering" while it skyrocketed to 56..60 FPS
- when turning on delta blitting.
-
- Of course your mileage may vary: it is up to the programmer to
- set up the appropriate "delta clipping" methods depending on what kind
- of animation is being performed.
-
- Maybe you are thinking "HA! Now there are VL-BUS and PCI
- I don't need to program for fucking old ISA ...".
- Well, given the current trend, the VL-BUS and PCI buses you think are fast now
- are going to be a bottleneck for a 300 MHz SSPHARK
- (SuperScalar Processor from Hell with Advanced Risky Killpower ;) )
- driving a 4096x4096 24bit color mode on a 100-inch display.
-
- THE TOUCHMAP
-
- The bitmap you render in system ram contains the image you want on screen.
- If you want to "blit only the differences" you have to store
- some information to "remember" what has changed from one frame to the next.
- I call this structure a TOUCHMAP: every time you modify (touch) the bitmap
- you store some info in the touchmap, so when you have to delta-blit
- you use the touchmap to find the altered pixels to blit.
-
- If you want speed, the "touchmap composition" has to be an algorithm
- of O(n) computational complexity (linear) and the overhead has to be
- the least possible.
-
- I evaluated various touchmapping methods; here is the one I chose:
-
- A bit-equivalent mask of the display bitmap where
- ONE BIT in the TOUCHMAP
- "marks" FOUR PIXELS (A DWORD) on the BITMAP.
-
- The bitmap/touchmap size ratio is around 32/1 (quite good)
- and the touchbits are packed into DWORDS (so, when you "touch", you use
- the massive speed of 32bit transfers, instead of slow bit-by-bit things).
- Using loop unrolling you can pump data to the video card at full speed.
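-
- A C sketch of "touching" a horizontal span using whole-dword masks, as a
- rough illustration of the layout above. It is not taken from 386video.asm:
- the name touch_span, the bit order (bit 0 of a dword = leftmost 4-pixel
- group) and the absence of clipping are all assumptions.

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200
#define BITS_PER_ROW   ((SCREEN_W + 3) / 4)        /* one bit per 4 pixels: 80 */
#define DWORDS_PER_ROW ((BITS_PER_ROW + 31) / 32)  /* rounded up: 3 dwords     */

static uint32_t touchmap[SCREEN_H * DWORDS_PER_ROW];

/* Mark pixels x0..x1 (inclusive) of row y as modified.
 * Assumes 0 <= x0 <= x1 < SCREEN_W and 0 <= y < SCREEN_H (no clipping).
 * Assumed bit order: bit 0 of each dword is its leftmost 4-pixel group. */
static void touch_span(int x0, int x1, int y)
{
    int first = x0 / 4, last = x1 / 4;   /* first and last touch-bit index */
    int dw0 = first / 32, dw1 = last / 32;
    uint32_t *row = &touchmap[y * DWORDS_PER_ROW];
    uint32_t head = 0xFFFFFFFFu << (first % 32);      /* bits >= first */
    uint32_t tail = 0xFFFFFFFFu >> (31 - last % 32);  /* bits <= last  */

    if (dw0 == dw1) {
        row[dw0] |= head & tail;         /* span fits in a single dword */
    } else {
        row[dw0] |= head;
        for (int i = dw0 + 1; i < dw1; i++)
            row[i] = 0xFFFFFFFFu;        /* fully covered dwords */
        row[dw1] |= tail;
    }
}
```

- For a 320x200 screen this touchmap takes 200 * 3 dwords = 2400 bytes
- against 64000 bytes of bitmap; the padding bits at the end of each row
- are never set by touch_span.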
-
- Nota bene:
-
- The touchmap is an ARRAY OF DWORDS; each bit in a dword is a flag
- for four consecutive pixels.
- The touchmap has as many rows as the logical display screen height in pixels,
- and each row has as many BITS as logical_display_screen_width/4 (in pixels)
- rounded up to a multiple of 32 (so the length of a touchmap row can be
- expressed in dwords).
- When you manipulate the touchmap ALWAYS USE DWORD ACCESS; this is
- an absolute need to minimize "touchmap updating" overhead.
- I've tested various methods; the "dword sized" touchmap is faster
- than anything else on a 386 class processor animating lots of independent
- objects.
- This is due to the 32 to 1 ratio between actual pixel data and touchmap data
- and to the "always aligned dword" access you can use with this method.
- To further reduce memory usage I use a self-compiling "loop unroller":
- this way, instead of checking each bit I check a byte and call
- the appropriate "unrolled loop" for it.
- With this method I perform only one compare and call
- instead of eight compares and branches (this keeps my 386 happy
- because the fewer the jumps, the more the pipeline stays filled and running).
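-
- The following C sketch shows the shape of a touchmap-driven delta blit
- with one lookup per touchmap byte. It is a table-driven stand-in, not the
- self-compiled unrolled code of 386video: instead of calling a generated
- routine per byte value, it looks up a precomputed list of bit runs.
- All names are hypothetical, the bit order (bit 0 = leftmost 4-pixel group)
- is an assumption, and padding bits past the screen width must never be set.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200
#define DWORDS_PER_ROW 3   /* 80 touch bits per row, rounded up to 96 */

/* For every possible touchmap byte: the runs of set bits it contains
 * (a byte holds at most 4 separate runs, e.g. 01010101b). */
typedef struct { uint8_t count, start[4], len[4]; } RunList;
static RunList runs_for_byte[256];

/* Build the lookup table once at startup. */
static void build_run_table(void)
{
    for (int v = 0; v < 256; v++) {
        RunList *r = &runs_for_byte[v];
        r->count = 0;
        for (int bit = 0; bit < 8; ) {
            if (!(v & (1 << bit))) { bit++; continue; }
            int start = bit;
            while (bit < 8 && (v & (1 << bit))) bit++;
            r->start[r->count] = (uint8_t)start;
            r->len[r->count] = (uint8_t)(bit - start);
            r->count++;
        }
    }
}

/* Copy only the touched 4-pixel groups from frame to vram and clear the
 * touchmap for the next frame. Returns the number of bytes copied. */
static long delta_blit(uint8_t *vram, const uint8_t *frame, uint32_t *touchmap)
{
    long copied = 0;
    for (int y = 0; y < SCREEN_H; y++) {
        for (int d = 0; d < DWORDS_PER_ROW; d++) {
            uint32_t bits = touchmap[y * DWORDS_PER_ROW + d];
            if (bits == 0) continue;      /* 128 untouched pixels skipped */
            touchmap[y * DWORDS_PER_ROW + d] = 0;
            for (int b = 0; b < 4; b++, bits >>= 8) {
                const RunList *r = &runs_for_byte[bits & 0xFF];
                for (int i = 0; i < r->count; i++) {
                    /* bit index within the row, times 4 pixels per bit */
                    long x = (d * 32 + b * 8 + r->start[i]) * 4L;
                    long off = (long)y * SCREEN_W + x;
                    size_t n = (size_t)r->len[i] * 4;
                    memcpy(vram + off, frame + off, n);
                    copied += (long)n;
                }
            }
        }
    }
    return copied;
}
```

- Call build_run_table() once, then delta_blit() once per frame; a second
- call with nothing touched in between copies nothing.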
-
- WHY 256 COLORS ONLY
-
- The 8bit/pixel modes are the least processor intensive you can find
- (this means lots of speed), and 256 colors are good enough for most games.
- You can blit 4 dots in a single memory access, mask quickly
- and implement fast compression/decompression methods if needed.
- If you think 16 or 24 bit/pixel modes are nicer to look at, you are right,
- but the most common display cards use dynamic ram and this means that
- the higher the video refresh bandwidth, the lower the cpu/blitter bandwidth.
- What's more, I want things capable of running in 4 Mbyte; having bitmaps
- two to four times the size of plain 256 color ones is no good.
-
-
-
-
- For further info and explanations look into
- 386video.asm, 386video.inc, driver.txt, xvd.txt, makefile
- and the XVD driver sources (for example chips450.asm)
-
- Ciao,
- Lorenzo Micheletto knight@maya.dei.unipd.it
-
-
-